14. Text Files in Python
The first two minutes of the video below cover the glob library, which makes it simple to open files that share a similar path structure (like our folder of Roger Ebert review text files).
[Video: Text Files In Python 1]
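Here is a minimal sketch of the glob pattern the video describes. It assumes every review is a .txt file sitting directly inside one folder. The temporary folder and the file names are hypothetical stand-ins for ebert_reviews. In the notebook you would pass a pattern such as 'ebert_reviews/*.txt' instead.

```python
import glob
import os
import tempfile

# Build a throwaway folder that mimics ebert_reviews/ (hypothetical file names).
with tempfile.TemporaryDirectory() as folder:
    for name in ["1_movie_a.txt", "2_movie_b.txt", "notes.md"]:
        with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
            f.write("placeholder\n")

    # glob.glob returns every path that matches the wildcard pattern.
    # Its order is not guaranteed, so sort it for a reproducible loop.
    paths = sorted(glob.glob(os.path.join(folder, "*.txt")))

    for path in paths:
        print(os.path.basename(path))  # notes.md is skipped by the *.txt pattern
```

Only the two .txt files match, so the loop prints 1_movie_a.txt and then 2_movie_b.txt.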
Quiz
We have 88 Roger Ebert reviews to open and read. They are in the ebert_reviews folder, which you can see in the Jupyter Notebook dashboard below (click jupyter in the top left-hand corner to access the dashboard). If you want to work outside of the Udacity classroom, click this link to download a zipped version of that folder.
We'll need a loop that iterates through all of the files in this folder, opens and reads each one, and then extracts the pieces of text we need as separate pieces of data:
- the first line, which is the movie title (used to merge with the master dataset)
- the second line, which is the review URL (not necessary for the word cloud but nice to have)
- everything from the third line onwards, which is the review text
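The three pieces above can be pulled from a single open file with two readline() calls followed by one read(). This is a sketch under stated assumptions: the helper name parse_review, the dictionary keys and the sample text are all hypothetical, and io.StringIO stands in for a real file handle.

```python
import io

def parse_review(file):
    """Hypothetical helper: split an open review file into its three parts.

    Assumes line 1 is the title, line 2 is the URL, and the rest is the review.
    """
    title = file.readline().strip()   # readline() keeps the trailing '\n', so strip it
    url = file.readline().strip()     # continues from where the first call stopped
    review_text = file.read()         # read() returns everything that is left
    return {"title": title, "review_url": url, "review_text": review_text}

# io.StringIO behaves like a text file handle (hypothetical sample content).
sample = io.StringIO(
    "Some Movie (1999)\n"
    "https://example.com/reviews/some-movie\n"
    "First paragraph of the review.\n"
    "Second paragraph.\n"
)
row = parse_review(sample)
print(row["title"])  # Some Movie (1999)
```

The review text is left unstripped here so its paragraph breaks survive; whether to trim it is a choice for the word cloud step.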
The Jupyter Notebook below contains template code that:
- Creates an empty list, df_list, to which dictionaries will be appended. This list of dictionaries will eventually be converted to a pandas DataFrame (this is the most efficient way of building a DataFrame row by row).
- Loops through each movie's Roger Ebert review text file in the ebert_reviews folder.
- Opens each text file, using a path generated by glob, as a file handle called file.
- Creates a DataFrame called df by converting df_list using the pd.DataFrame constructor.
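The template steps above fit together roughly as follows. This is a minimal sketch, not the course's solution notebook: the function name build_reviews_df, the column names, the encoding="utf-8" argument and the two sample files are assumptions made for illustration.

```python
import glob
import os
import tempfile

import pandas as pd

def build_reviews_df(folder):
    """Hypothetical helper: read every .txt review in folder into a DataFrame."""
    df_list = []  # one dict per review, converted to a DataFrame at the end
    for ebert_review in sorted(glob.glob(os.path.join(folder, "*.txt"))):
        with open(ebert_review, encoding="utf-8") as file:  # encoding is an assumption
            title = file.readline().strip()
            review_url = file.readline().strip()
            review_text = file.read()
            df_list.append({"title": title,
                            "review_url": review_url,
                            "review_text": review_text})
    # One constructor call on the finished list avoids growing a DataFrame row by row.
    return pd.DataFrame(df_list, columns=["title", "review_url", "review_text"])

# Usage with two hypothetical review files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    reviews = {"a.txt": "Movie A\nhttps://example.com/a\nGreat.\n",
               "b.txt": "Movie B\nhttps://example.com/b\nNot great.\n"}
    for name, text in reviews.items():
        with open(os.path.join(folder, name), "w", encoding="utf-8") as f:
            f.write(text)
    df = build_reviews_df(folder)

print(df)
```

Passing columns= fixes the column order explicitly instead of relying on dictionary key order.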
Your task is to extract the movie title, the Roger Ebert review URL, and the review text from each text file, and append each trio as a dictionary to df_list.
The file methods required for this task are:
- readline()
- read()
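Both methods advance the same read position in the file, so the order of the calls matters. The short sketch below shows that behavior, using io.StringIO with made-up lines as a stand-in for a file opened in text mode with open().

```python
import io

handle = io.StringIO("line one\nline two\nline three\nline four\n")

first = handle.readline()       # 'line one\n': one line, newline included
second = handle.readline()      # 'line two\n': picks up where the last call stopped
rest = handle.read()            # 'line three\nline four\n': everything that is left
after_end = handle.read()       # '': nothing remains once the end is reached
also_empty = handle.readline()  # '': readline() also returns an empty string at the end

print(repr(first), repr(second), repr(rest), repr(after_end))
```

This is why two readline() calls followed by one read() yield the title, the URL, and the remaining review text in that order.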
Workspace
[Jupyter Notebook workspace, not reproduced here. Access it through the Udacity classroom; some Udacity workspace files are also published at https://github.com/udacity.]
Solution
[Video: Text Files In Python 2]